“The German Credit data has data on 1000 past credit applicants, described by 30 variables. Each applicant is rated as “Good” or “Bad” credit (encoded as 1 and 0 respectively in the response variable). We want to obtain a model that may be used to determine if new applicants present a good or bad credit risk.”
In this analysis, we plan to use both a CART model and clustering.
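As a rough sketch of the CART half of this plan, the snippet below fits a classification tree with `rpart`. `rpart` is one common CART implementation in R and is a recommended package normally installed alongside base R; the source does not say which implementation the analysis uses. The data frame `toy` is a synthetic stand-in, not the German Credit data. Only the column names `CHK_ACCT`, `DURATION` and `RESPONSE` come from the dataset. The `cp` and `minsplit` values are illustrative assumptions, not tuned settings.

```r
library(rpart)

set.seed(1)
n <- 200
# Synthetic stand-in data (NOT the German Credit data), used only to show the calls
toy <- data.frame(
  CHK_ACCT = factor(sample(c("0", "1", "2", "3"), n, replace = TRUE)),
  DURATION = sample(4:72, n, replace = TRUE)
)
# Let the probability of "good" (1) depend on the predictors so the tree has structure to find
p_good <- ifelse(toy$CHK_ACCT == "3", 0.9, 0.5) - (toy$DURATION > 36) * 0.3
toy$RESPONSE <- factor(rbinom(n, 1, p_good), levels = c("0", "1"))

# method = "class" grows a classification tree for the factor response
fit <- rpart(RESPONSE ~ ., data = toy, method = "class",
             control = rpart.control(cp = 0.01, minsplit = 20))

# Predicted classes (0 = bad, 1 = good) for the training rows
pred <- predict(fit, newdata = toy, type = "class")
table(predicted = pred, actual = toy$RESPONSE)
```

With the real data, the formula would typically be `RESPONSE ~ .` on the data frame with the `OBS.` identifier column removed, since a row index carries no predictive meaning.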
Q: Should we convert all the variables that are not numerical into factors?
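A minimal sketch of the conversion follows. It assumes the raw file stores every column as an integer code and that the genuinely numeric columns are those still reported as `int` in the `str()` output: `OBS.`, `DURATION`, `AMOUNT`, `INSTALL_RATE`, `AGE`, `NUM_CREDITS` and `NUM_DEPENDENTS`. The helper `to_factors` and the three-row data frame `credit` are hypothetical, for illustration only.

```r
# Columns to keep numeric, taken from the int columns in the str() output
numeric_cols <- c("OBS.", "DURATION", "AMOUNT", "INSTALL_RATE",
                  "AGE", "NUM_CREDITS", "NUM_DEPENDENTS")

# Hypothetical helper: convert every column not listed in keep_numeric to a factor
to_factors <- function(df, keep_numeric) {
  cat_cols <- setdiff(names(df), keep_numeric)
  df[cat_cols] <- lapply(df[cat_cols], factor)
  df
}

# Hypothetical three-row stand-in; with the real data this would be the full data frame
credit <- data.frame(OBS. = 1:3, CHK_ACCT = c(0L, 1L, 3L),
                     DURATION = c(6L, 48L, 12L), RESPONSE = c(1L, 0L, 1L))
credit <- to_factors(credit, numeric_cols)
str(credit)
```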
```
#> 'data.frame': 1000 obs. of 32 variables:
#>  $ OBS.            : int 1 2 3 4 5 6 7 8 9 10 ...
#>  $ CHK_ACCT        : Factor w/ 4 levels "0","1","2","3": 1 2 4 1 1 ..
#>  $ DURATION        : int 6 48 12 42 24 36 24 36 12 30 ...
#>  $ HISTORY         : Factor w/ 5 levels "0","1","2","3",..: 5 3 5 3..
#>  $ NEW_CAR         : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 ..
#>  $ USED_CAR        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 ..
#>  $ FURNITURE       : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 2 1 1 ..
#>  $ RADIO.TV        : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 2 ..
#>  $ EDUCATION       : Factor w/ 3 levels "-1","0","1": 2 2 3 2 2 3 2..
#>  $ RETRAINING      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ AMOUNT          : int 1169 5951 2096 7882 4870 9055 2835 6948 3..
#>  $ SAV_ACCT        : Factor w/ 5 levels "0","1","2","3",..: 5 1 1 1..
#>  $ EMPLOYMENT      : Factor w/ 5 levels "0","1","2","3",..: 5 3 4 4..
#>  $ INSTALL_RATE    : int 4 2 2 2 3 2 3 2 2 4 ...
#>  $ MALE_DIV        : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 2 ..
#>  $ MALE_SINGLE     : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 1 ..
#>  $ MALE_MAR_or_WID : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ CO.APPLICANT    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ GUARANTOR       : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 ..
#>  $ PRESENT_RESIDENT: Factor w/ 4 levels "1","2","3","4": 4 2 3 4 4 ..
#>  $ REAL_ESTATE     : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 2 ..
#>  $ PROP_UNKN_NONE  : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 1 1 1 ..
#>  $ AGE             : int 67 22 49 45 53 35 53 35 61 28 ...
#>  $ OTHER_INSTALL   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ RENT            : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 ..
#>  $ OWN_RES         : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 2 1 2 ..
#>  $ NUM_CREDITS     : int 2 1 1 1 2 1 1 1 1 2 ...
#>  $ JOB             : Factor w/ 4 levels "0","1","2","3": 3 3 2 3 3 ..
#>  $ NUM_DEPENDENTS  : int 1 1 2 2 2 2 1 1 1 1 ...
#>  $ TELEPHONE       : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 2 1 ..
#>  $ FOREIGN         : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 ..
#>  $ RESPONSE        : Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 ..
```
```
#>      OBS.      CHK_ACCT    DURATION    HISTORY  NEW_CAR  USED_CAR
#>  Min.   :   1  0:274     Min.   : 4.0  0: 40    0:766    0:897
#>  1st Qu.: 251  1:269     1st Qu.:12.0  1: 49    1:234    1:103
#>  Median : 500  2: 63     Median :18.0  2:530
#>  Mean   : 500  3:394     Mean   :20.9  3: 88
#>  3rd Qu.: 750            3rd Qu.:24.0  4:293
#>  Max.   :1000            Max.   :72.0
#>  FURNITURE  RADIO.TV  EDUCATION  RETRAINING     AMOUNT      SAV_ACCT
#>  0:819      0:720     -1:  1     0:903       Min.   :  250  0:603
#>  1:181      1:280     0 :950     1: 97       1st Qu.: 1366  1:103
#>                       1 : 49                 Median : 2320  2: 63
#>                                              Mean   : 3271  3: 48
#>                                              3rd Qu.: 3972  4:183
#>                                              Max.   :18424
#>  EMPLOYMENT  INSTALL_RATE  MALE_DIV  MALE_SINGLE  MALE_MAR_or_WID
#>  0: 62       Min.   :1.00  0:950     0:452        0:908
#>  1:172       1st Qu.:2.00  1: 50     1:548        1: 92
#>  2:339       Median :3.00
#>  3:174       Mean   :2.97
#>  4:253       3rd Qu.:4.00
#>              Max.   :4.00
#>  CO.APPLICANT  GUARANTOR  PRESENT_RESIDENT  REAL_ESTATE  PROP_UNKN_NONE
#>  0:959         0:948      1:130             0:718        0:846
#>  1: 41         1: 51      2:308             1:282        1:154
#>                2:  1      3:149
#>                           4:413
#>
#>
#>       AGE       OTHER_INSTALL  RENT   OWN_RES  NUM_CREDITS
#>  Min.   : 19.0  0:814          0:821  0:287    Min.   :1.00
#>  1st Qu.: 27.0  1:186          1:179  1:713    1st Qu.:1.00
#>  Median : 33.0                                 Median :1.00
#>  Mean   : 35.6                                 Mean   :1.41
#>  3rd Qu.: 42.0                                 3rd Qu.:2.00
#>  Max.   :125.0                                 Max.   :4.00
#>   JOB   NUM_DEPENDENTS  TELEPHONE  FOREIGN  RESPONSE
#>  0: 22  Min.   :1.00    0:596      0:963    0:300
#>  1:200  1st Qu.:1.00    1:404      1: 37    1:700
#>  2:630  Median :1.00
#>  3:148  Mean   :1.16
#>         3rd Qu.:1.00
#>         Max.   :2.00
```
| No | Variable | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|
| 1 | OBS. [integer] | 1000 distinct values (Integer sequence) | 1000 (100.0%) | 0 (0.0%) |
| 2 | CHK_ACCT [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 3 | DURATION [integer] | 33 distinct values | 1000 (100.0%) | 0 (0.0%) |
| 4 | HISTORY [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 5 | NEW_CAR [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 6 | USED_CAR [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 7 | FURNITURE [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 8 | RADIO.TV [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 9 | EDUCATION [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 10 | RETRAINING [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 11 | AMOUNT [integer] | 921 distinct values | 1000 (100.0%) | 0 (0.0%) |
| 12 | SAV_ACCT [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 13 | EMPLOYMENT [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 14 | INSTALL_RATE [integer] | | 1000 (100.0%) | 0 (0.0%) |
| 15 | MALE_DIV [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 16 | MALE_SINGLE [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 17 | MALE_MAR_or_WID [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 18 | CO.APPLICANT [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 19 | GUARANTOR [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 20 | PRESENT_RESIDENT [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 21 | REAL_ESTATE [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 22 | PROP_UNKN_NONE [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 23 | AGE [integer] | 54 distinct values | 1000 (100.0%) | 0 (0.0%) |
| 24 | OTHER_INSTALL [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 25 | RENT [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 26 | OWN_RES [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 27 | NUM_CREDITS [integer] | | 1000 (100.0%) | 0 (0.0%) |
| 28 | JOB [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 29 | NUM_DEPENDENTS [integer] | | 1000 (100.0%) | 0 (0.0%) |
| 30 | TELEPHONE [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 31 | FOREIGN [factor] | | 1000 (100.0%) | 0 (0.0%) |
| 32 | RESPONSE [factor] | | 1000 (100.0%) | 0 (0.0%) |
Generated by summarytools 1.0.0 (R version 4.1.2)
2022-04-28
In this part, we profile the observations by response class. This lets us see, variable by variable, how applicants with good credit differ from those with bad credit.
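We initially planned to use boxplots as above, but they are not suitable for binary variables: a 0/1 variable takes only two values, so a box-and-whisker summary conveys little beyond the share of 1s.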
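For binary (and other categorical) variables, an alternative to a boxplot is a cross-tabulation normalised within each response class. A minimal sketch follows. The eight-row data frame for the real column names `TELEPHONE` and `RESPONSE` is hypothetical, with values made up for illustration.

```r
# Hypothetical stand-in for one binary predictor and the response (NOT the real data)
credit <- data.frame(
  TELEPHONE = factor(c("0", "1", "1", "0", "0", "1", "0", "0")),
  RESPONSE  = factor(c("1", "1", "1", "0", "0", "1", "1", "0"))
)

# Share of each TELEPHONE level within each RESPONSE class; each column sums to 1
tab <- table(TELEPHONE = credit$TELEPHONE, RESPONSE = credit$RESPONSE)
shares <- prop.table(tab, margin = 2)
round(shares, 2)
```

`barplot(shares, legend.text = TRUE)` draws the same comparison as stacked bars, one bar per response class. This makes differences between good and bad applicants visible at a glance.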